GH-36411: [Python] Use scikit-build-core as build backend for PyArrow and get rid of setup.py#49259
GH-36411: [Python] Use scikit-build-core as build backend for PyArrow and get rid of setup.py#49259raulcd wants to merge 20 commits intoapache:mainfrom
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
|
@github-actions crossbow submit -g python |
This comment was marked as outdated.
This comment was marked as outdated.
|
@github-actions crossbow submit wheel-*-cp313-cp313-amd64 |
This comment was marked as outdated.
This comment was marked as outdated.
|
@github-actions crossbow submit wheel-windows-cp313-cp313-amd64 |
|
Revision: 166ff63 Submitted crossbow builds: ursacomputing/crossbow @ actions-f1bbdf4eaa
|
|
@github-actions crossbow submit wheel-windows-cp313-cp313-amd64 |
|
Revision: 36fefd4 Submitted crossbow builds: ursacomputing/crossbow @ actions-bd811d95ea
|
|
@github-actions crossbow submit -g python -g wheel |
This comment was marked as outdated.
This comment was marked as outdated.
|
@github-actions crossbow submit python-sdist |
|
Revision: c518c90 Submitted crossbow builds: ursacomputing/crossbow @ actions-ea249e92ce
|
|
@github-actions crossbow submit python-sdist |
|
Revision: 131e2c0 Submitted crossbow builds: ursacomputing/crossbow @ actions-1c58e78aff
|
|
@github-actions crossbow submit python-sdist |
|
Revision: 30e04be Submitted crossbow builds: ursacomputing/crossbow @ actions-3179f97956
|
|
@github-actions crossbow submit -g python -g wheel |
This comment was marked as outdated.
This comment was marked as outdated.
|
@github-actions crossbow submit wheel-windows-cp313-cp313-amd64 |
|
Revision: 5606749 Submitted crossbow builds: ursacomputing/crossbow @ actions-3c4293e220
|
…tings cmake.build-type usage
…ngs cmake.args usage
…s and update usage. Also document --config-settings cmake.build-type
cdd7629 to
1110dbe
Compare
|
@github-actions crossbow submit -g python -g wheel |
This comment was marked as outdated.
This comment was marked as outdated.
|
@github-actions crossbow submit wheel-cp313 |
|
Revision: 75d2f7f Submitted crossbow builds: ursacomputing/crossbow @ actions-fda9c59efb |
|
@github-actions crossbow submit -g python -g wheel |
|
Revision: 75d2f7f Submitted crossbow builds: ursacomputing/crossbow @ actions-62c2fce38f |
WillAyd
left a comment
There was a problem hiding this comment.
Nice work - some small comments but generally this lgtm
| - Number of processes used to compile PyArrow’s C++/Cython components | ||
| - ``''`` | ||
|
|
||
| Note that ``pip install`` uses ``--config-settings`` (plural) while |
There was a problem hiding this comment.
I don't remember this nuance from the meson-python branch. Looks like there we just use -C to pass arguments to the front-end, whether its pip or build. Maybe we should use that here and avoid this altogether?
There was a problem hiding this comment.
that's not a bad tip. Thanks @WillAyd ! Let me try to see if -C works for both! I found it pretty annoying that the API was different.
| try: | ||
| yield | ||
| finally: | ||
| if sys.platform != "win32": |
There was a problem hiding this comment.
So these edits are made to the source tree directly right? No way to do this in an isolated build location?
There was a problem hiding this comment.
I am unsure how we could manage those. We want to do non isolated builds to build against specific numpy/pandas versions, mandating isolated builds creates complexity on environments and dependency management. If we try to copy ourselves on a different location to build we are also creating a bunch of complexity for a really simple thing, having licenses copied! Do you have any suggestions on what I could try?
To be fair I've found the whole licenses / notice and how build backends manages them to be too tight and inflexible leaving cases like ours on a difficult spot. I have been thinking on just copying the files but maintenance is a burden just for keeping them in sync. Maybe we should copy them and have a CI check that validates they are in-sinc?
I feel like a really simple build backend that just copies the data, either by moving soft-links to hard links or, in the case of Windows, copying the files is the simplest solution.
There was a problem hiding this comment.
Yea I agree with your overall sentiment - sadly its a lot of effort to maintain these two files :-)
I think the meson-python approach was pretty good, where a dist script copied these into the source distribution at the time it was being made. I'm wondering if scikit-build-core has a similar hook
There was a problem hiding this comment.
From what I understand from scikit-build-core when building sdist we can't use an out of source tree. Basically the sdist is copying the files that are under Path() (.):
https://github.com/scikit-build/scikit-build-core/blob/49b26c49fb4bc2a94177e4ef27c79e6389375710/src/scikit_build_core/build/sdist.py#L153-L161
There's no config setting where we could use an out of source sdist where we could copy files.
There was a problem hiding this comment.
Have you raised the issue on the scikit-build-core issue tracker?
…her we use build or pip
|
@github-actions crossbow submit -g python -g wheel |
|
Revision: 0ada5d9 Submitted crossbow builds: ursacomputing/crossbow @ actions-5f312d0d53 |
| run: | | ||
| gem install test-unit openssl | ||
| pip install "cython>=3.1" setuptools pytest requests setuptools-scm | ||
| pip install build "cython>=3.1" pytest requests scikit-build-core setuptools-scm |
There was a problem hiding this comment.
Can we rely on pyproject.toml for build dependencies at some point?
| set PYARROW_BUILD_TYPE=%CMAKE_BUILD_TYPE% | ||
| set PYARROW_BUILD_VERBOSE=1 | ||
| set PYARROW_BUNDLE_ARROW_CPP=ON | ||
| set PYARROW_CMAKE_GENERATOR=%CMAKE_GENERATOR% |
There was a problem hiding this comment.
Does scikit-build-core pick up the CMAKE_xxx environment variables automatically?
There was a problem hiding this comment.
Based on other changes you did, it doesn't seem it does?
| shutil.copy2(parent_license, pyarrow_license) | ||
| else: | ||
| # For Unix-like systems we replace the symlink with | ||
| # a hardlink to avoid copying the file content. |
There was a problem hiding this comment.
Is it important to do this? We might make the code simpler by copying on all platform.
| try: | ||
| yield | ||
| finally: | ||
| if sys.platform != "win32": |
There was a problem hiding this comment.
Have you raised the issue on the scikit-build-core issue tracker?
| # as it's the most common build type for users building from source. | ||
| # This is mainly relevant for our Windows wheels, which are built with | ||
| # Visual Studio and thus use a multi-config generator with Release. | ||
| # As a note this is only to populate config_internal.h.cmake. |
There was a problem hiding this comment.
Hmm, it's not possible to populate config_internal.h.cmake separately for each config?
| pyarrow = ["*.pxd", "*.pyi", "*.pyx", "includes/*.pxd", "py.typed"] | ||
| [tool.scikit-build.cmake.define] | ||
| PYARROW_BUNDLE_ARROW_CPP = {env = "PYARROW_BUNDLE_ARROW_CPP", default = "OFF"} | ||
| PYARROW_BUNDLE_CYTHON_CPP = {env = "PYARROW_BUNDLE_CYTHON_CPP", default = "OFF"} |
There was a problem hiding this comment.
Did you check that those were actually bundled in the wheels?
| [tool.scikit-build.cmake.define] | ||
| PYARROW_BUNDLE_ARROW_CPP = {env = "PYARROW_BUNDLE_ARROW_CPP", default = "OFF"} | ||
| PYARROW_BUNDLE_CYTHON_CPP = {env = "PYARROW_BUNDLE_CYTHON_CPP", default = "OFF"} | ||
| PYARROW_GENERATE_COVERAGE = {env = "PYARROW_GENERATE_COVERAGE", default = "OFF"} |
There was a problem hiding this comment.
At some point we should check whether this flag still works. Eons ago it was used to generate codecov reports, but we ditched all the corresponding scripting that had to merge coverage files generated during the C++ test suite and the Python test suite.
Can you open an issue for that?
Rationale for this change
Move our PyArrow build backend from setuptools and a custom setup.py to scikit-build-core which is just build backend for CMake related projects.
What changes are included in this PR?
Move from setuptools to scikit-build-core and remove PyArrow setup.py. Update some of the build requirements and minor fixes.
A custom build backend has been also been created in order to wrap scikit-build-core in order to fix problems on License files for monorepos.
pyproject.toml metadata validation expects license files to exist before exercising the build backend that's why we create symlinks. Our thin build backend will just make those symlinks hard-links in order for license and notice files to contain the contents and be added as part of the sdist.
Remove flags that are not used anymore (were only part of setup.py) and documented and validated how the same flags have to be used now.
Are these changes tested?
Yes all Python CI tests, wheels and sdist are successful.
Are there any user-facing changes?
Yes, users building PyArrow will now require the new build dependencies to exercise the build and depending on the flags used they might require to use the new documented way of using those flags.